First approach to the selection of lexical units for continuous speech recognition of Basque
Abstract
The selection of appropriate Lexical Units is an important issue in Language Model (LM) generation. Words have classically been used as the unit in most Continuous Speech Recognition systems. However, in recent years proposals of non-word units have begun to appear. Since Basque is an agglutinative language with a certain structure inside the word, non-word units could be an adequate option. In this work, a statistical analysis of the morphological structure of Basque has been carried out. This analysis shows a slight increase in the confusion rates of Continuous Speech Recognition systems, due to the large increase in acoustically similar, short units. Finally, several proposals of Lexical Units are analyzed to deal with the problem.
Similar resources
Automatic Morphological Segmentation for Continuous Speech Recognition of Basque
The selection of appropriate Lexical Units (LUs) is an important issue in the development of Continuous Speech Recognition (CSR) systems. Word has been used classically as unit in most of them. However, proposals of non-word units have begun to arise. Since the subject of this study is the Basque language, which is an agglutinative language with a complex structure inside words, non-word units ...
Selection of Lexical Units for Continuous Speech Recognition of Basque
The selection of appropriate Lexical Units (LUs) is an important issue in the development of Continuous Speech Recognition (CSR) systems. Words have been used classically as the recognition unit in most of them. However, proposals of nonword units are beginning to arise. Basque is an agglutinative language with some structure inside words, for which non-word morpheme like units could be an appr...
Selection of sublexical units for continuous speech recognition of basque
This paper describes the work carried out to select the most suitable set of Sublexical Units for Continuous Speech Recognition of Basque. Even if there are several dialects in Basque, only one of them has been used to choose the preliminary set of sounds. Bearing in mind this aim, a wide experimentation has been carried out to select Context Independent Phone-Like Units. Then, in order to obta...
Decision Tree-Based Context Dependent Sublexical Units for Continuous Speech Recognition of Basque
This paper presents a new methodology, based on the classical decision trees, to get a suitable set of context dependent sublexical units for Basque Continuous Speech Recognition (CSR). The original method proposed by Bahl [1] was applied as the benchmark. Then two new features were added: a data massaging to emphasise the data and a fast and efficient Growing and Pruning algorithm for DT const...
Improved Bayesian Training for Context-Dependent Modeling in Continuous Persian Speech Recognition
Context-dependent modeling is a widely used technique for better phone modeling in continuous speech recognition. While different types of context-dependent models have been used, triphones have been known as the most effective ones. In this paper, a Maximum a Posteriori (MAP) estimation approach has been used to estimate the parameters of the untied triphone model set used in data-driven clust...